Overview

Dataset info

Number of variables: 6
Number of observations: 2935849
Missing cells: 0 (0.0%)
Duplicate rows: 6 (< 0.1%)
Total size in memory: 134.4 MiB
Average record size in memory: 48.0 B
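
The memory figures above are consistent with every column storing 8 bytes per row. That is an assumption inferred from the 48.0 B record size, not something the report states. A quick arithmetic check in Python:

```python
# Figures copied from the "Dataset info" block above.
N_ROWS = 2_935_849
N_COLUMNS = 6
BYTES_PER_VALUE = 8  # assumption: each column stores 8 bytes per row

MIB = 1024 ** 2

record_size = N_COLUMNS * BYTES_PER_VALUE    # bytes per row
total_mib = N_ROWS * record_size / MIB       # whole dataset
column_mib = N_ROWS * BYTES_PER_VALUE / MIB  # a single column

print(f"record size: {record_size} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the results round to 48 B, 134.4 MiB and 22.4 MiB, matching the totals above and the per-variable "Memory size" entries.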

Variables types

Numeric: 5
Categorical: 1
Boolean: 0
Date: 0
URL: 0
Text (Unique): 0
Rejected: 0
Unsupported: 0

Warnings

Dataset has 6 (< 0.1%) duplicate rows [Warning]
date only contains datetime values, but is categorical. Consider applying pd.to_datetime() [Type]
date has a high cardinality: 1034 distinct values [Warning]
date_block_num has 115690 (3.9%) zeros [Zeros]
item_cnt_day is highly skewed (γ1 = 272.8331617) [Skewed]
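
The first two warnings can be addressed with a few lines of pandas. The sketch below uses a toy frame, not the real data, and assumes the dates are day-first strings such as 28.12.2013, which is the form they take throughout this report.

```python
import pandas as pd

# Toy rows in the same layout as the report's sample; not the real data.
df = pd.DataFrame(
    {
        "date": ["02.01.2013", "03.01.2013", "03.01.2013", "28.12.2013"],
        "item_cnt_day": [1.0, 1.0, 1.0, 3.0],
    }
)

# Pass the format explicitly so 03.01.2013 is read as 3 January, not 1 March.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")

# Count rows that repeat an earlier row exactly, then drop them.
n_duplicates = int(df.duplicated().sum())
deduplicated = df.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```

Whether the 6 duplicate rows in the real data should be dropped is a judgment call this report cannot settle; the sketch only shows how to find them.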

Variables

date
Categorical

Distinct count: 1034
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Value                Count    Frequency (%)
28.12.2013           9434     0.3%
29.12.2013           9335     0.3%
30.12.2014           9324     0.3%
30.12.2013           9138     0.3%
31.12.2014           8347     0.3%
27.12.2014           8041     0.3%
31.12.2013           7765     0.3%
23.02.2013           7577     0.3%
28.12.2014           7370     0.3%
21.12.2013           6773     0.2%
Other values (1024)  2852745  97.2%
 
Max length: 10
Mean length: 10
Min length: 10
Contains chars: False
Contains digits: True
Contains spaces: False
Contains non-words: True
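
A frequency table in the shape used by this report (top values, counts, percentages and a final "Other values" row) can be approximated as follows. The function name `frequency_table` and its `top_n` parameter are hypothetical, and this is a sketch of the idea rather than the profiler's own implementation.

```python
import pandas as pd


def frequency_table(values: pd.Series, top_n: int = 10) -> pd.DataFrame:
    """Return the top_n most common values plus one 'Other values' row."""
    counts = values.value_counts()  # sorted by count, descending
    top = counts.head(top_n)
    rest = counts.iloc[top_n:]

    rows = [(str(value), int(count)) for value, count in top.items()]
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))

    table = pd.DataFrame(rows, columns=["Value", "Count"])
    table["Frequency (%)"] = (100 * table["Count"] / len(values)).round(1)
    return table


# Toy data, not the real column.
dates = pd.Series(
    ["28.12.2013"] * 3 + ["29.12.2013"] * 2 + ["30.12.2014", "31.12.2014"]
)
table = frequency_table(dates, top_n=2)
print(table)
```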

date_block_num
Numeric

Distinct count: 34
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 14.56991146
Minimum: 0
Maximum: 33
Zeros (%): 3.9%

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 7
Median: 14
Q3: 23
95-th percentile: 31
Maximum: 33
Range: 33
Interquartile range: 16

Descriptive statistics

Standard deviation: 9.422987709
Coef of variation: 0.6467429629
Kurtosis: -1.082868996
Mean: 14.56991146
MAD: 8.119015654
Skewness: 0.2038579466
Sum: 42775060
Variance: 88.79269736
Memory size: 22.4 MiB

Histogram with fixed size bins (bins=34)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 29.5 30.5 31.5 32.5 33. ], "bayesian blocks" binning strategy used)

Value              Count    Frequency (%)
11                 143246   4.9%
23                 130786   4.5%
2                  121347   4.1%
0                  115690   3.9%
1                  108613   3.7%
7                  104772   3.6%
6                  100548   3.4%
5                  100403   3.4%
12                 99349    3.4%
10                 96736    3.3%
Other values (24)  1814359  61.8%

Minimum 5 values

Value  Count   Frequency (%)
0      115690  3.9%
1      108613  3.7%
2      121347  4.1%
3      94109   3.2%
4      91759   3.1%

Maximum 5 values

Value  Count  Frequency (%)
33     53514  1.8%
32     50588  1.7%
31     57029  1.9%
30     55549  1.9%
29     54617  1.9%
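
The sample rows in this report pair 02.01.2013 with block 0 and 24.10.2015 with block 33, and there are 34 distinct values. That pattern suggests date_block_num is a month counter starting in January 2013. This reading is an inference, not something the report states, and the sketch below only shows how such a counter could be derived from the dates.

```python
import pandas as pd

# Assumption: block 0 corresponds to January 2013 (inferred from the sample rows).
BASE_YEAR = 2013


def month_block(dates: pd.Series) -> pd.Series:
    """Map day-first date strings to a running month index."""
    parsed = pd.to_datetime(dates, format="%d.%m.%Y")
    return (parsed.dt.year - BASE_YEAR) * 12 + (parsed.dt.month - 1)


# Dates taken from the first and last sample rows of the report.
sample = pd.Series(["02.01.2013", "15.01.2013", "24.10.2015", "31.10.2015"])
blocks = month_block(sample)
print(blocks.tolist())
```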
 

item_cnt_day
Numeric

Distinct count: 198
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 1.242640885
Minimum: -22
Maximum: 2169
Zeros (%): 0.0%

Quantile statistics

Minimum: -22
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 2169
Range: 2191
Interquartile range: 0

Descriptive statistics

Standard deviation: 2.618834431
Coef of variation: 2.107474864
Kurtosis: 177478.0988
Mean: 1.242640885
MAD: 0.4459868445
Skewness: 272.8331617
Sum: 3648206
Variance: 6.858293776
Memory size: 22.4 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[-2.200e+01 -5.500e+00 -2.500e+00 -1.500e+00 0.000e+00 ... 1.105e+02 1.565e+02 2.595e+02 6.530e+02 2.169e+03], "bayesian blocks" binning strategy used)

Value               Count    Frequency (%)
1                   2629372  89.6%
2                   194201   6.6%
3                   47350    1.6%
4                   19685    0.7%
5                   10474    0.4%
-1                  7252     0.2%
6                   6338     0.2%
7                   4057     0.1%
8                   2903     0.1%
9                   2177     0.1%
Other values (188)  12040    0.4%

Minimum 5 values

Value  Count  Frequency (%)
-22    1      < 0.1%
-16    1      < 0.1%
-9     1      < 0.1%
-6     2      < 0.1%
-5     4      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
2169   1      < 0.1%
1000   1      < 0.1%
669    1      < 0.1%
637    1      < 0.1%
624    1      < 0.1%
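
With nearly 90% of rows equal to 1 and a handful of extreme values on both sides, the tails of item_cnt_day are worth inspecting separately. The sketch below uses toy values and hypothetical clipping bounds; the report does not prescribe any treatment, and whether negative counts represent returns is an assumption.

```python
import pandas as pd

# Toy values that mimic the reported shape (mostly 1, a few extremes); not the real data.
counts = pd.Series([1.0, 1.0, 1.0, 2.0, -1.0, 3.0, 1.0, 2169.0])

negative = counts[counts < 0]         # possibly returns or corrections (assumption)
share_of_ones = (counts == 1).mean()  # fraction of rows with a count of exactly 1

# Hypothetical bounds, chosen only for illustration.
LOWER, UPPER = 0, 20
clipped = counts.clip(lower=LOWER, upper=UPPER)

print(f"negative rows: {len(negative)}")
print(f"share of ones: {share_of_ones:.2f}")
print(f"skewness before/after clipping: {counts.skew():.2f} / {clipped.skew():.2f}")
```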
 

item_id
Numeric

Distinct count: 21807
Unique (%): 0.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 10197.22706
Minimum: 0
Maximum: 22169
Zeros (%): < 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 1540
Q1: 4476
Median: 9343
Q3: 15684
95-th percentile: 20949
Maximum: 22169
Range: 22169
Interquartile range: 11208

Descriptive statistics

Standard deviation: 6324.297354
Coef of variation: 0.6201977575
Kurtosis: -1.225209966
Mean: 10197.22706
MAD: 5579.673443
Skewness: 0.2571735482
Sum: 2.993751886e+10
Variance: 39996737.02
Memory size: 22.4 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 26.5 27.5 28.5 29.5 ... 22164.5 22165.5 22166.5 22167.5 22169. ], "bayesian blocks" binning strategy used)

Value                 Count    Frequency (%)
20949                 31340    1.1%
5822                  9408     0.3%
17717                 9067     0.3%
2808                  7479     0.3%
4181                  6853     0.2%
7856                  6602     0.2%
3732                  6475     0.2%
2308                  6320     0.2%
4870                  5811     0.2%
3734                  5805     0.2%
Other values (21797)  2840689  96.8%

Minimum 5 values

Value  Count  Frequency (%)
0      1      < 0.1%
1      6      < 0.1%
2      2      < 0.1%
3      2      < 0.1%
4      1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
22169  1      < 0.1%
22168  6      < 0.1%
22167  1114   < 0.1%
22166  270    < 0.1%
22165  2      < 0.1%
 

item_price
Numeric

Distinct count: 19993
Unique (%): 0.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 890.8532327
Minimum: -1
Maximum: 307980
Zeros (%): 0.0%

Quantile statistics

Minimum: -1
5-th percentile: 99
Q1: 249
Median: 399
Q3: 999
95-th percentile: 2690
Maximum: 307980
Range: 307981
Interquartile range: 750

Descriptive statistics

Standard deviation: 1729.799631
Coef of variation: 1.941733573
Kurtosis: 445.5328258
Mean: 890.8532327
MAD: 769.9530494
Skewness: 10.7504227
Sum: 2615410572
Variance: 2992206.762
Memory size: 22.4 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[-1.00000000e+00 9.50000000e-02 1.50000000e-01 3.50000000e-01 7.04356846e-01 ... 3.27400000e+04 3.29937500e+04 3.59905000e+04 4.63860000e+04 3.07980000e+05], "bayesian blocks" binning strategy used)

Value                 Count    Frequency (%)
299                   291352   9.9%
399                   242603   8.3%
149                   218432   7.4%
199                   184044   6.3%
349                   101461   3.5%
599                   95673    3.3%
999                   82784    2.8%
799                   77882    2.7%
249                   77685    2.6%
699                   76493    2.6%
Other values (19983)  1487440  50.7%

Minimum 5 values

Value   Count  Frequency (%)
-1      1      < 0.1%
0.07    2      < 0.1%
0.0875  1      < 0.1%
0.09    1      < 0.1%
0.1     2932   0.1%

Maximum 5 values

Value   Count  Frequency (%)
307980  1      < 0.1%
59200   1      < 0.1%
50999   1      < 0.1%
49782   1      < 0.1%
42990   4      < 0.1%
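
The extremes above (a single price of -1 and a single price of 307980, against a median of 399) are natural candidates for a validity check. The following sketch flags such rows on toy data and applies a log transform to the positive prices; the percentile cut-off is hypothetical and not taken from the report.

```python
import numpy as np
import pandas as pd

# Toy prices echoing the extremes listed above; not the real data.
prices = pd.Series([-1.0, 0.1, 299.0, 399.0, 999.0, 307980.0])

non_positive = prices <= 0                # e.g. the single -1 entry
# Hypothetical cut-off: anything above the 95th percentile of this toy sample.
high = prices > prices.quantile(0.95)
suspect = prices[non_positive | high]

# log1p compresses the long right tail; only defined here for positive prices.
valid = prices[prices > 0]
log_prices = np.log1p(valid)

print(suspect.tolist())
```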
 

shop_id
Numeric

Distinct count: 60
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 33.00172829
Minimum: 0
Maximum: 59
Zeros (%): 0.3%

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 22
Median: 31
Q3: 47
95-th percentile: 57
Maximum: 59
Range: 59
Interquartile range: 25

Descriptive statistics

Standard deviation: 16.22697305
Coef of variation: 0.4917007044
Kurtosis: -1.025358056
Mean: 33.00172829
MAD: 13.83000446
Skewness: -0.07236142921
Sum: 96888091
Variance: 263.3146543
Memory size: 22.4 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 3.5 5.5 ... 55.5 56.5 57.5 58.5 59. ], "bayesian blocks" binning strategy used)

Value              Count    Frequency (%)
31                 235636   8.0%
25                 186104   6.3%
54                 143480   4.9%
28                 142234   4.8%
57                 117428   4.0%
42                 109253   3.7%
27                 105366   3.6%
6                  82663    2.8%
58                 71441    2.4%
56                 69573    2.4%
Other values (50)  1672671  57.0%

Minimum 5 values

Value  Count  Frequency (%)
0      9857   0.3%
1      5678   0.2%
2      25991  0.9%
3      25532  0.9%
4      38242  1.3%

Maximum 5 values

Value  Count   Frequency (%)
59     42108   1.4%
58     71441   2.4%
57     117428  4.0%
56     69573   2.4%
55     34769   1.2%
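
Several entries in each numeric block are derived from others: the mean is the sum divided by the number of observations, the range is maximum minus minimum, the interquartile range is Q3 minus Q1, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A small cross-check using the shop_id figures above:

```python
import math

# Figures copied from the shop_id block and the dataset overview.
n_rows = 2_935_849
total = 96_888_091
std = 16.22697305
q1, q3 = 22, 47
minimum, maximum = 0, 59

mean = total / n_rows
derived = {
    "Mean": mean,
    "Range": maximum - minimum,
    "Interquartile range": q3 - q1,
    "Variance": std ** 2,
    "Coef of variation": std / mean,
}

# Values as printed in the report, for comparison.
reported = {
    "Mean": 33.00172829,
    "Range": 59,
    "Interquartile range": 25,
    "Variance": 263.3146543,
    "Coef of variation": 0.4917007044,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-7)
    print(f"{name}: {value:.10g} (matches report: {ok})")
```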
 

Correlations

Missing values

Sample

First rows

         date  date_block_num  item_cnt_day  item_id  item_price  shop_id
0  02.01.2013               0           1.0    22154      999.00       59
1  03.01.2013               0           1.0     2552      899.00       25
2  05.01.2013               0          -1.0     2552      899.00       25
3  06.01.2013               0           1.0     2554     1709.05       25
4  15.01.2013               0           1.0     2555     1099.00       25
5  10.01.2013               0           1.0     2564      349.00       25
6  02.01.2013               0           1.0     2565      549.00       25
7  04.01.2013               0           1.0     2572      239.00       25
8  11.01.2013               0           1.0     2572      299.00       25
9  03.01.2013               0           3.0     2573      299.00       25

Last rows

               date  date_block_num  item_cnt_day  item_id  item_price  shop_id
2935839  24.10.2015              33           1.0     7315       399.0       25
2935840  31.10.2015              33           1.0     7409       299.0       25
2935841  11.10.2015              33           1.0     7393       349.0       25
2935842  10.10.2015              33           1.0     7384       749.0       25
2935843  09.10.2015              33           1.0     7409       299.0       25
2935844  10.10.2015              33           1.0     7409       299.0       25
2935845  09.10.2015              33           1.0     7460       299.0       25
2935846  14.10.2015              33           1.0     7459       349.0       25
2935847  22.10.2015              33           1.0     7440       299.0       25
2935848  03.10.2015              33           1.0     7460       299.0       25
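
The sample rows show item_cnt_day as a float (1.0, -1.0, 3.0), while every value listed in its frequency tables is a whole number. If that holds for the full column, which this report does not confirm, the column could be stored as a small integer type. A minimal sketch on three of the sample rows, retyped here as CSV:

```python
import io

import pandas as pd

# First three sample rows from the report, retyped as CSV for illustration.
csv_text = """date,date_block_num,item_cnt_day,item_id,item_price,shop_id
02.01.2013,0,1.0,22154,999.00,59
03.01.2013,0,1.0,2552,899.00,25
05.01.2013,0,-1.0,2552,899.00,25
"""
df = pd.read_csv(io.StringIO(csv_text))

counts = df["item_cnt_day"]
all_whole = bool((counts % 1 == 0).all())  # True when no value has a fractional part
if all_whole:
    # int16 spans -32768..32767, which covers the reported range of -22 to 2169.
    df["item_cnt_day"] = counts.astype("int16")

print(df.dtypes)
```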